Abstract

In 1938, Morse and Hedlund proved that the subword complexity function of 
a two-sided infinite word is either bounded or at least linearly growing. In 1982, 
Ehrenfeucht and Rozenberg proved that this gap property holds for the subword 
complexity function of any language. Their result was then sharpened in 2005 by 
Balogh and Bollobás. The aim of the present paper is to present a self-contained,
compact proof of Ehrenfeucht and Rozenberg's result. 

1 Notation and definitions 

The semiring of natural integers is denoted $\mathbb{N}$. The ring of rational integers is denoted $\mathbb{Z}$.

Words and languages. Throughout this paper $A$ denotes a finite set of symbols and is
called the alphabet. A word over $A$ is a finite string of elements of $A$. The set of all words
over $A$ is denoted $A^*$. A language over $A$ is a subset of $A^*$. For every $w \in A^*$, the length
of $w$ is denoted $|w|$. For each $n \in \mathbb{N}$, the set of all $n$-length words over $A$ is denoted $A^n$.
The word of length zero is called the empty word. The set of all non-empty words over $A$
is denoted $A^+$. Word concatenation is denoted multiplicatively.

Factor and complexity function. Let $x \in A^*$, $y \in A^*$, and $L \subseteq A^*$. We say that $x$ is
a prefix (resp. a suffix) of $y$ if there exists $w \in A^*$ such that $y = xw$ (resp. $y = wx$). We
say that $x$ is a factor of $y$ if there exist $w, w' \in A^*$ such that $y = wxw'$. By extension,
we say that $x$ is a factor of $L$ if $x$ is a factor of some word in $L$. For each $n \in \mathbb{N}$, $F_n(L)$
denotes the set of all $n$-length factors of $L$. The function from $\mathbb{N}$ to itself that maps each
$n \in \mathbb{N}$ to the cardinality of $F_n(L)$ is called the complexity function of $L$. Note that the
complexity function of any non-empty language maps $0$ to $1$ because the empty word is a
factor of every word. Note also that a language has the same complexity function as its
set of factors.

Infinite and bi-infinite words. A (right-)infinite word over $A$ is a function from $\mathbb{N}$ to
$A$. A bi-infinite word over $A$ is a function from $\mathbb{Z}$ to $A$. Let $u$ be an infinite or bi-infinite
word over $A$ and let $x \in A^*$. We say that $x$ is a factor of $u$ if there exists $i$ in the domain of $u$ such that
$x = u(i)u(i+1)u(i+2) \cdots u(i+|x|-1)$. The set of all factors of $u$ is called the language
of $u$. The complexity function of $u$ is defined as the complexity function of its language.

2 The Morse-Hedlund complexity gap

The aim of this paper is to present a self-contained, compact proof of: 

Theorem 1 (Ehrenfeucht and Rozenberg, 1982 [4]). Let $p$ be the complexity function of
some language. Either $p(n)$ is greater than $n$ for every $n \in \mathbb{N}$, or $p$ is bounded from above.

For instance, it follows from Theorem 1 that no complexity function grows like $\sqrt{n}$.
However, the lower bound is trivially achievable.

Example 1. Consider the language $U = \{a^i b^j : i, j \in \mathbb{N}\}$. For each $n \in \mathbb{N}$, it is clear that
$F_n(U) = \{a^{n-k} b^k : k = 0, 1, 2, \ldots, n\}$, so the complexity function of $U$ maps $n$ to $n + 1$.
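
The count in Example 1 can be checked numerically on a finite sample of the language. The sketch below is only an illustration: the helper names `factors` and `complexity` and the cut-off `bound` are hypothetical, and a sample whose exponents are capped at `bound` only gives exact values of the complexity function for lengths up to `bound`.

```python
def factors(language, n):
    """Return the set of all n-length factors of the words in `language`."""
    return {w[i:i + n] for w in language for i in range(len(w) - n + 1)}


def complexity(language, n):
    """Value at n of the complexity function of `language`."""
    return len(factors(language, n))


# Finite sample of the language of Example 1: words a^i b^j with both
# exponents capped at `bound` (an assumption made to keep the set finite).
bound = 8
U_sample = {"a" * i + "b" * j for i in range(bound + 1) for j in range(bound + 1)}

# Values of the complexity function for n = 0, 1, ..., bound.
values = [complexity(U_sample, n) for n in range(bound + 1)]
print(values)
```

For lengths larger than `bound` the sample misses factors, so the computed values are no longer meaningful there.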

In addition to proving Theorem 1, Ehrenfeucht and Rozenberg described the class of
those languages with bounded complexity functions: 

Theorem 2 (Ehrenfeucht and Rozenberg, 1982 [4]). Let $L$ be a language. The complexity
function of $L$ is bounded from above if, and only if, there exists a finite subset $T \subseteq A^* \times
A^* \times A^*$ such that

$$L \subseteq \bigcup_{(x,y,z) \in T} \{x y^n z : n \in \mathbb{N}\}.$$

Before proving Theorem 1 in Section 3, let us state the various related results that can
be found in the literature. Everything starts with Morse and Hedlund's celebrated papers
[6] and [7]. They were published in 1938 and 1940, respectively.

Definition 1. We say that a function $p\colon \mathbb{N} \to \mathbb{N}$ is FIATC (first increasing and then
constant) if there exists $m \in \mathbb{N}$ such that

• $p(0) < p(1) < p(2) < \cdots < p(m)$ and

• $p(m + n) = p(m)$ for every $n \in \mathbb{N}$.

Theorem 3 (Morse and Hedlund, 1938 [EJE]). Let $u$ be a bi-infinite word and let $p$ denote
the complexity function of $u$.

• If $u$ is not periodic then $p$ is increasing.

• If $u$ is periodic then $p$ is FIATC and $\sup_{n \in \mathbb{N}} p(n)$ is the least period of $u$.

On the one hand, FIATC functions are clearly bounded. On the other hand, observe
that any increasing function $p\colon \mathbb{N} \to \mathbb{N}$ satisfies $p(n) \ge n + p(0)$ for every $n \in \mathbb{N}$. Hence,
Theorem 3 implies that Theorem 1 holds for the particular case where $p$ is the complexity
function of a bi-infinite word. Again, the lower bound is achievable:

Example 2. Consider the bi-infinite word $u$ over $\{a, b\}$ given by: $u(i) = b$ for every $i \in \mathbb{N}$
and $u(-i) = a$ for every $i \in \mathbb{N} \setminus \{0\}$. The language of $u$ equals $U$, where $U$ is as in
Example 1. Therefore, the complexity function of $u$ maps $n$ to $n + 1$ for each $n \in \mathbb{N}$.

As illustrated with the following two examples, bounded complexity functions are not 
necessarily FIATC and unbounded complexity functions are not necessarily increasing: 

Example 3. Let $p$ denote the complexity function of $L = \{b a^{2k} b : k \in \mathbb{N}\}$. For each
integer $n \ge 2$, let $X_n = \{a^n, b a^{n-1}, a^{n-1} b\}$. If $n \ge 3$ is odd then $F_n(L) = X_n$
and thus $p(n) = 3$. If $n \ge 2$ is even then $F_n(L) = X_n \cup \{b a^{n-2} b\}$ and thus
$p(n) = 4$.

Example 4. Let $U$ be as in Example 1. Let $p$ denote the complexity function of $L =
U \cup \{a b^{2k} a : k \in \mathbb{N}\} \cup \{b a^{2k} b : k \in \mathbb{N}\}$. If $n \ge 3$ is odd then
$F_n(L) = F_n(U) \cup \{b^{n-1} a, b a^{n-1}\}$ and thus $p(n) = n + 3$. If $n \ge 4$ is even
then $F_n(L) = F_n(U) \cup \{b^{n-1} a, b a^{n-1}, a b^{n-2} a, b a^{n-2} b\}$ and thus $p(n) = n + 5$. In particular, $p(n + 1) < p(n)$ for every even integer $n \ge 4$.
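
The qualitative behaviour claimed for Examples 3 and 4 can be recomputed on finite samples. The sketch below assumes the languages are $\{b a^{2k} b\}$ for Example 3 and $U$ together with the words $a b^{2k} a$ and $b a^{2k} b$ for Example 4; the names `complexity`, `K`, `N`, `L3`, `L4`, `p3` and `p4` are hypothetical, and the caps `K` and `N` are chosen so that the samples contain every factor of length at most `N`.

```python
def complexity(language, n):
    """Number of distinct n-length factors of the words in `language`."""
    return len({w[i:i + n] for w in language for i in range(len(w) - n + 1)})


K = 6    # cap on the exponent k (assumption, keeps the samples finite)
N = 10   # largest length examined; the samples are exact for n <= N

# Example 3: the words b a^(2k) b.
L3 = {"b" + "a" * (2 * k) + "b" for k in range(K + 1)}

# Example 4: U together with the words a b^(2k) a and b a^(2k) b.
U = {"a" * i + "b" * j for i in range(N + 1) for j in range(N + 1)}
L4 = U | {"a" + "b" * (2 * k) + "a" for k in range(K + 1)} | L3

p3 = [complexity(L3, n) for n in range(N + 1)]
p4 = [complexity(L4, n) for n in range(N + 1)]

# Bounded but not eventually constant: list the values taken from n = 2 onwards.
print(p3, sorted(set(p3[2:])))
# Unbounded but not monotone: list the lengths n at which the value drops.
print(p4, [n for n in range(N) if p4[n + 1] < p4[n]])
```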

The most famous variant of Theorems 1, 2 and 3 is:

Theorem 4 ([3], see also [21E1IH]). Let $u$ be an infinite word and let $p$ denote the complexity
function of $u$.

• If $u$ is not eventually periodic then $p$ is increasing.

• If $u$ is eventually periodic then $p$ is FIATC and the period of $u$ is not greater than
$\sup_{n \in \mathbb{N}} p(n)$.

An infinite word is called Sturmian if its complexity function maps each $n \in \mathbb{N}$ to
$n + 1$: by Theorem 4, Sturmian words are those non-eventually-periodic infinite words
with minimum complexity. There is no trivial example of a Sturmian word. The study of
Sturmian words was initiated by Morse and Hedlund in 1940 [7]. It is still an active field
of research [5, 8].
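
A standard example from the literature, not discussed in this paper, is the Fibonacci word, the fixed point of the substitution $a \mapsto ab$, $b \mapsto a$. The sketch below counts the factors of a finite prefix of that word; this only gives the complexity for lengths that are small compared with the prefix, and the names `fibonacci_prefix` and `complexity` as well as the prefix length 2000 are assumptions made for illustration.

```python
def fibonacci_prefix(length):
    """Prefix of the fixed point of the substitution a -> ab, b -> a."""
    word = "a"
    while len(word) < length:
        # Apply the substitution once; each iterate is a prefix of the next.
        word = "".join("ab" if letter == "a" else "a" for letter in word)
    return word[:length]


def complexity(word, n):
    """Number of distinct n-length factors occurring in the finite word `word`."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


prefix = fibonacci_prefix(2000)
values = [complexity(prefix, n) for n in range(11)]
print(values)
```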

The next result is the latest improvement of Theorem 1:

Theorem 5 (Balogh and Bollobás, 2005 [1]). Let $\varphi$ be the function that maps each real
number $x$ to

$$\varphi(x) = \left\lceil \frac{x + 1}{2} \right\rceil \left\lfloor \frac{x + 1}{2} \right\rfloor.$$

• Let $p$ be the complexity function of some language and let $m \in \mathbb{N}$. If $p(m) \le m$ then
$p(n + p(m) + m) \le \varphi(p(m))$ for every $n \in \mathbb{N}$.

• For each $k \in \mathbb{N}$, there exists a function $p_k\colon \mathbb{N} \to \mathbb{N}$ such that $p_k$ is the complexity
function of some binary language and both sets $\{n \in \mathbb{N} : p_k(n) = k\}$ and
$\{n \in \mathbb{N} : p_k(n) = \varphi(k)\}$ are infinite.

The second part of Theorem 5 ensures that the function $\varphi$ is optimal.

3 The proof of Theorem 1



In what follows, L denotes a language over the alphabet A and p denotes the complexity 
function of L. 

Definition 2 (Special factor). We say that a word $w \in A^*$ is a special factor of the
language $L$ if there exist $a, b \in A$ with $a \ne b$ such that both $wa$ and $wb$ are factors of $L$.

Let $\rho\colon A^+ \to A^*$ be the function mapping each non-empty word $w \in A^+$ to its $(|w| - 1)$-length
prefix: for each $w \in A^+$, there exists $a \in A$ such that $w = \rho(w)a$. For any language
$L \subseteq A^*$ and any $n \in \mathbb{N}$, $\rho$ maps each word in $F_{n+1}(L)$ to a word in $F_n(L)$. A word $w \in A^*$
is a special factor of $L$ if, and only if, $\rho$ maps to $w$ more than one factor of $L$. It follows
that $L$ admits an $n$-length special factor if, and only if, $\rho$ is not injective on $F_{n+1}(L)$.
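
The equivalence between the existence of an $n$-length special factor and the failure of injectivity of the map that deletes the last letter can be observed on a finite sample. The sketch below is a minimal illustration; the function names and the sample (words $a^k b$ with $k \le 12$) are hypothetical choices, and the sample is only exact for lengths well below its cap.

```python
def factors(language, n):
    """Return the set of all n-length factors of the words in `language`."""
    return {w[i:i + n] for w in language for i in range(len(w) - n + 1)}


def special_factors(language, alphabet, n):
    """n-length words w such that wa and wb are factors for two distinct letters."""
    longer = factors(language, n + 1)
    return {w[:-1] for w in longer
            if sum(1 for c in alphabet if w[:-1] + c in longer) >= 2}


def prefix_map_is_injective(language, n):
    """True if deleting the last letter is injective on the (n+1)-length factors."""
    longer = factors(language, n + 1)
    return len({w[:-1] for w in longer}) == len(longer)


# Finite sample of the words a^k b (the cap 12 is an assumption).
sample = {"a" * k + "b" for k in range(13)}
for n in range(10):
    print(n, special_factors(sample, "ab", n), prefix_map_is_injective(sample, n))
```

On this sample, each length below the cap has a special factor and the map is never injective, in line with the equivalence stated above.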

Lemma 1. If a language only admits finitely many special factors then its complexity 
function is eventually constant. 

Proof. Assume that $L$ only admits finitely many special factors. Let $n \in \mathbb{N}$ be such
that $L$ does not admit any $n$-length special factor. Then, the prefix map $\rho$ induces an injection from
$F_{n+1}(L)$ into $F_n(L)$. Inequality $p(n + 1) \le p(n)$ follows. Therefore, $p$ is non-increasing
on $\{n \in \mathbb{N} : n > m\}$, where $m \in \mathbb{N}$ denotes the maximum length of a special factor of
$L$. Since $p$ takes its values in $\mathbb{N}$, it is eventually constant. □

Note that the converse of Lemma 1 does not hold in general:

Example 5. Consider the case where $L = \{a^k b : k \in \mathbb{N}\}$. The complexity function of $L$
is eventually constant: for every integer $n \ge 1$, $F_n(L) = \{a^n, a^{n-1} b\}$ and thus $p(n) = 2$.
However, $L$ admits infinitely many special factors: for every $n \in \mathbb{N}$, $a^n$ is a special factor
of $L$.

Exercise 1. Prove that if the language of an infinite word admits only finitely many special
factors then this infinite word is eventually periodic.

Exercise 2. Prove that if the language $L$ only admits finitely many special factors then
there exists a finite subset $T \subseteq A^* \times A^*$ such that

$$L \subseteq \bigcup_{(x,y) \in T} \{x y^n : n \in \mathbb{N}\}.$$

Definition 3. We say that the language $L$ is (right-)extendable if for each $w \in L$, there
exists $a \in A$ such that $wa \in L$.

Example 6. The language of any infinite or bi-infinite word is extendable.

Lemma 2. If an extendable language admits infinitely many special factors then its complexity function
is increasing.

Proof. Let $n \in \mathbb{N}$. If $L$ is extendable then the prefix map $\rho$ induces a surjection from $F_{n+1}(L)$ onto $F_n(L)$.
If $L$ admits infinitely many special factors then $L$ admits a special factor of length $n$
because every suffix of a special factor is also a special factor, and thus $\rho$ is not injective on
$F_{n+1}(L)$. Hence, if $L$ is extendable and admits infinitely many special factors then $\rho$ induces
a non-bijective surjection from $F_{n+1}(L)$ onto $F_n(L)$, and thus inequality $p(n + 1) > p(n)$
holds. □

Exercise 3. Prove that the complexity function of any extendable language is either 
increasing or FIATC. 

Exercise 4. For each $n \in \mathbb{N}$, let $s(n)$ denote the number of $n$-length special factors of $L$.

1. Let $\alpha$ denote the cardinality of $A$. Prove that $p(n + 1) - p(n) \le (\alpha - 1)s(n)$ for every
$n \in \mathbb{N}$.

2. Prove that if $L$ is extendable then $p(n + 1) - p(n) \ge s(n)$ for every $n \in \mathbb{N}$.

Lemmas 1 and 2 can be easily deduced from questions 1 and 2 of Exercise 4, respectively.
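
The two inequalities of Exercise 4 can be observed numerically before attempting a proof. The sketch below uses a finite sample of the extendable language of Example 1, with exponents capped at `bound`, and only examines lengths for which the sample is exact; the names `factors`, `special_count`, `rows` and `bound` are hypothetical.

```python
def factors(language, n):
    """Return the set of all n-length factors of the words in `language`."""
    return {w[i:i + n] for w in language for i in range(len(w) - n + 1)}


def special_count(language, alphabet, n):
    """Number of n-length factors that can be followed by two distinct letters."""
    longer = factors(language, n + 1)
    return sum(1 for w in factors(language, n)
               if sum(1 for c in alphabet if w + c in longer) >= 2)


alphabet = "ab"
bound = 10  # cap on the exponents (assumption, keeps the sample finite)
U_sample = {"a" * i + "b" * j for i in range(bound + 1) for j in range(bound + 1)}

rows = []
for n in range(bound):  # the sample is exact for lengths up to `bound`
    growth = len(factors(U_sample, n + 1)) - len(factors(U_sample, n))
    s = special_count(U_sample, alphabet, n)
    upper_ok = growth <= (len(alphabet) - 1) * s   # question 1
    lower_ok = growth >= s                         # question 2 (extendable case)
    rows.append((n, growth, s, upper_ok, lower_ok))

for row in rows:
    print(row)
```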

Lemma 3. Let X and Y be two languages such that Y is finite and non-empty. The 
complexity function of XY is bounded if, and only if, the complexity function of X is 
bounded. 

Proof. Let $f$, $g$ and $h$ denote the complexity functions of $X$, $Y$, and $XY$, respectively. We
have to prove that $h$ is bounded if, and only if, $f$ is bounded.

Since $Y$ is non-empty, every factor of $X$ is also a factor of $XY$, and thus $f$ is bounded
from above by $h$. The "only if" part follows.

Let us now prove the "if" part. The inclusion

$$F_n(XY) \subseteq \bigcup_{k=0}^{n} F_{n-k}(X) F_k(Y)$$

yields the inequality

$$h(n) \le \sum_{k=0}^{n} f(n-k)\, g(k). \qquad (1)$$

Let $M = \sup_{n \in \mathbb{N}} f(n)$ and $S = \sum_{n \in \mathbb{N}} g(n)$. Observe that $S < \infty$ because $Y$ is finite. Assume that $f$ is
bounded. Then we have $M < \infty$ and it follows from Equation (1) that

$$h(n) \le M \sum_{k=0}^{n} g(k) \le MS.$$

Hence, $MS$ is a finite upper bound for $h$. □
We can now prove Theorem 1.

Proof of Theorem 1. For each $k \in \mathbb{N}$, let $L_k$ denote the set of all $w \in A^*$ such that $wA^k \cap
L \ne \emptyset$. Let $L'$ denote the set of all $w \in A^*$ such that $w$ is a factor of $L_k$ for every $k \in \mathbb{N}$. Let
$p'$ denote the complexity function of $L'$, and for each $k \in \mathbb{N}$, let $p_k$ denote the complexity
function of $L_k$. Since each word in $L'$ is a factor of $L_0 = L$, $p'$ bounds $p$ from below.
Therefore, if $p'$ is increasing then for every $n \in \mathbb{N}$, it holds that $p(n) \ge p'(n) > n$. It
remains to show that if $p'$ is not increasing then $p$ is bounded.

Claim 1. For any $i, j \in \mathbb{N}$ with $i \le j$, all factors of $L_j$ are factors of $L_i$.

Proof. Each word in $L_j$ is a prefix of a word in $L_i$: for each $w \in L_j$, there exists $x \in A^{j-i}$
such that $wx \in L_i$. □
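
Claim 1 holds for every language, finite ones included, so it can be checked exhaustively on a small example. In the sketch below, `truncate(language, k)` computes the language obtained by deleting the last $k$ letters of each word of length at least $k$, which is the language indexed by $k$ in the proof; the function names and the four-word test language are hypothetical.

```python
def all_factors(language):
    """Return the set of all factors (of any length) of the words in `language`."""
    return {w[i:j] for w in language
            for i in range(len(w) + 1) for j in range(i, len(w) + 1)}


def truncate(language, k):
    """Words w such that wx belongs to `language` for some word x of length k."""
    return {w[:len(w) - k] for w in language if len(w) >= k}


language = {"abba", "baab", "aab", "b"}

# For every i <= j, each factor of the j-th truncation is a factor of the i-th.
checks = [all_factors(truncate(language, j)) <= all_factors(truncate(language, i))
          for j in range(6) for i in range(j + 1)]
print(all(checks))
```

Slicing with `w[:len(w) - k]` rather than `w[:-k]` keeps the case `k = 0` correct, since `w[:-0]` would return the empty word.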

Claim 2. The language $L'$ is extendable.

Proof. For each $w \in L_{k+1}$, there exists $a \in A$ such that $wa \in L_k$. Therefore, for each factor
$w$ of $L_{k+1}$, there exists $a \in A$ such that $wa$ is a factor of $L_k$. Let $w \in L'$. For each $k \in \mathbb{N}$,
let $a_k \in A$ be such that $w a_k$ is a factor of $L_k$. The finite alphabet $A$ contains a letter $a$
such that $a_k = a$ for infinitely many $k \in \mathbb{N}$. Therefore, $wa$ is a factor of $L_k$ for infinitely
many $k \in \mathbb{N}$. It now follows from Claim 1 that $wa \in L'$. □

Claim 3. If $p_k$ is bounded for some $k \in \mathbb{N}$ then $p$ is bounded.

Proof. Remark that $L$ is a subset of $L_k A^k \cup \{w \in L : |w| < k\}$. Since $\{w \in L : |w| < k\}$
is finite, $p$ is bounded whenever the complexity function of $L_k A^k$ is bounded. Besides,
Lemma 3 (applied with $X = L_k$ and $Y = A^k$) ensures that the complexity function of
$L_k A^k$ is bounded whenever $p_k$ is bounded. □

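Claim 4. For each $n \in \mathbb{N}$, there exists $k \in \mathbb{N}$ such that $F_n(L_k) \subseteq L'$.

Proof. Claim 1 ensures

$$F_n(L_0) \supseteq F_n(L_1) \supseteq F_n(L_2) \supseteq F_n(L_3) \supseteq \cdots$$

and since all sets are finite there exists $k \in \mathbb{N}$ such that

$$F_n(L_k) = F_n(L_{k+1}) = F_n(L_{k+2}) = F_n(L_{k+3}) = \cdots$$

Every word in $F_n(L_k)$ is then a factor of $L_j$ for every $j \in \mathbb{N}$, and thus belongs to $L'$. □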

Assume that $p'$ is not increasing. Then, combining Claim 2 and Lemma 2 we get that
$L'$ only admits finitely many special factors. Let $n \in \mathbb{N}$ be such that $n - 1$ is greater than the length of every
special factor of $L'$. By Claim 4, there exists $k \in \mathbb{N}$ such that $F_n(L_k) \subseteq L'$. Then $L_k$ has
no special factor with length $n - 1$ or more: if $w$ were such a special factor, then the suffix of length $n - 1$ of $w$ would be a special factor of $L'$. Thus $p_k$ is eventually constant by Lemma 1.
It now follows from Claim 3 that $p$ is bounded. □

Exercise 5. Prove Theorem 2 (Hint: use Exercise 2).